PROCESS AND SYSTEM FOR THE SECURE DISTRIBUTION
OF COMPRESSED DIGITAL TEXTS 



[0001] The present invention relates to the field of binary data resulting from transformations
applied to digital texts.

[0002] The present invention proposes a system that permits the protection and secure
distribution of compressed digital texts, and the restitution of the original digital text, while
preventing unauthorized use of or access to these compressed digital texts.

[0003] In the following, the term "text" denotes a succession of characters from an alphabet of
letters or digits, and of punctuation signs.

[0004] In the following, the term "digital text" denotes the succession of bytes representing
characters from an alphabet and/or punctuation signs and/or data for formatting and displaying a
text on a viewing screen.

[0005] In the following, the term "compressed digital text" denotes the binary data stream
resulting from a statistical compression algorithm applied to the digital text.

[0006] In the following, displaying a compressed digital text is defined as the series of
operations consisting in reading and decoding the succession of binary data that constitutes the
compressed digital text in order to restore the text on a viewing screen, so that it can be read and
understood, from a semantic viewpoint, by a human being.

[0007] The present invention relates more particularly to an apparatus capable of securely
transmitting a set of compressed digital texts to a viewing screen and/or of having them recorded
on the hard disk of a computer or on the recording medium of a box connecting the
telecommunication network to a viewing screen, such as a television screen or a personal
computer monitor, while preventing any fraudulent use, such as the making of illicit copies of
textual contents or of compressed digital texts. The invention also relates to a client-server
system between the server that supplies the secure compressed digital texts and the client that
displays, reads, records or prints them.

[0008] With current solutions it is possible to transmit voluminous texts and documents in
digital form via telecommunication networks of the cable, DSL (Digital Subscriber Line) or BLR
(Boucle Locale Radio, i.e., local radio loop) type. Furthermore, in order to prevent the pirating of
works and confidential documents broadcast in this manner, the latter are frequently encrypted or
scrambled by various means well known to an expert in the art.

[0009] As concerns the secure distribution of texts and binary data, the prior art contains
document WO9805142, "Multi-Matrix Encryption for Private Transmission of Data", which
presents a process and equipment for protecting data and transmitting them securely through an
electronic network. The document concerns the encryption of textual data with the aid of
matrices of ASCII characters generated by keys. The three key elements at the input are a PIN
(Personal Identification Number), the user's bank account number and a password. These three
keys initiate the generation of a matrix A and a matrix B. Matrices A and B are generated
pseudorandomly with the aid of an analytic function such as a logarithmic function, a
trigonometric function, a square root function or the like. The distribution of the characters in
matrices A and B is irregular and each character is unique. Three integrity control values are
calculated and incorporated in the protected stream: one of them represents the sum of the input
textual data and the other two relate to the three input key elements. The input data are
transformed into a four-digit decimal value by operations of permutation, addition, subtraction,
multiplication and division, and this value is then divided into two two-digit values. These two
values are used as indexes into the elements of matrices A and B in order to form the stream of
protected data. However, because of this division into two parts for the indexing, the size of the
protected stream increases considerably relative to the size of the initial data. Moreover, all the
protected data, as well as the three generated control values, are present in the protected stream.
Therefore, this prior art does not meet the high-security criteria that are the goal of the present
invention.

[0010] The protection of compressed digital texts carried out in conformity with the present
invention is based on the principle of deleting and replacing certain information coding the
original compressed digital texts by any method, e.g., substitution, modification, permutation or
shifting of the information. This protection is also based on a knowledge of the structure of the
binary data at the output of the encoder producing the compressed digital texts.

[0011] The present invention concerns the general principle of a process for securing
compressed digital texts. The solution consists in extracting and permanently preserving, in a
location that cannot be accessed by the user, namely in the distribution network, a part of the
compressed digital text recorded at the client's premises or sent online. This part is of prime
importance for exploiting the compressed digital text on a display screen but has a very low
volume relative to the total volume of the compressed digital text recorded at the user's premises
or received online. The missing part is transmitted via the distribution network at the moment
the compressed digital text is exploited.

[0012] As the compressed digital text is separated into two unequal parts, the larger part,
called the "modified compressed digital text", is transmitted via a conventional broadband or
narrowband broadcasting network, whereas the missing part, called the "complementary
information", is sent on demand via a narrowband telecommunication network such as the
conventional telephone networks or cellular networks of the GSM, GPRS or UMTS type, or by
using a small part of a network of the DSL or BLR type, or a subset of the bandwidth shared on a
cable network, or also via a physical support such as a memory card or any other support. The
two networks can advantageously be combined while retaining the two separate transmission
paths. The original compressed digital text is reconstituted on the equipment of the addressee by
a synthesis module from the modified compressed digital text and the complementary
information.

[0013] In order to implement the process the invention realizes a protection system 
comprising an analysis and protection module and a recomposition module that are based on a 
digital format stemming from the encoding of a digital text using statistical compression 
algorithms. The analysis and protection module proposed by the invention is based on the 
substitution by "decoys" or on the modification of part of the binary data composing the original 
compressed digital text. Because part of the original data of the original compressed digital text
has been removed and substituted during the generation of the modified compressed digital text,
the original compressed digital text cannot be recomposed from the data of the modified
compressed digital text alone.

[0014] Based on the characteristics of the compressed digital text, several variants of the 
protection process are implemented and are illustrated with the exemplary embodiments. 
[0015] The present invention relates more particularly to an apparatus capable of 
transmitting in a secure manner a digital text to a display device and/or for being recorded in the 
memory of the backup apparatus of a box connecting the telecommunication network to the 
display device while preserving the semantic content of the text but avoiding the possibility that 
the digital text could be read and copied illicitly. 
[0016] A compressed digital text generated by a statistical compression algorithm from a
digital text is constituted by a succession of binary data representing codes and/or entries in
coding tables and/or pointers to positions in the digital text.

[0017] The present invention consists, after the analysis of the compressed digital text, in 
extracting at least one original binary data in the compressed digital text which original binary 
data represents a code or an entry in a coding table or a pointer, which data is randomly selected, 
and in replacing it by a binary data called a decoy of the same size and of the same nature but 
with a random value in order to generate a compressed digital text in conformity with the format 
of the original compressed digital text. The displaying of the modified compressed digital text 
then restores a text that is illegible and/or incomprehensible from a semantic viewpoint for a 
human being. 

[0018] According to a variant of the invention the original binary data to be extracted is 
selected in a deterministic manner. 

[0019] According to another variant of the invention the value of the decoy binary data is 
calculated in a deterministic manner. 

[0020] According to another variant of the invention the decoy binary data has a size 
different from the size of the original binary data. 

[0021] The invention concerns, in its most general sense, a process for the secure distribution
of compressed digital texts formed by blocks of binary data resulting from digital
transformations applied to an original text, characterized in that it comprises:

A preparatory stage consisting in modifying at least one binary data in one of these blocks by
at least one substitution operation, consisting of the extraction of this binary data from a block
and its replacement by a decoy;

A transmission stage:

   i. Of a modified compressed digital text in conformity with the format of the original
   compressed digital text, constituted by the blocks modified during the preparatory stage, and

   ii. By a path separate from that of this modified compressed digital text, of digital
   complementary information permitting the reconstitution of the original compressed digital
   text by calculation on the equipment of the addressee, as a function of this modified
   compressed digital text and of this complementary information.
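
The preparatory stage and the reconstitution can be illustrated with a minimal sketch. It is not the patented implementation: it assumes the fields to substitute are given as non-overlapping (bit offset, bit width) pairs of at most 32 bits, counted from the most significant bit of the text, and it uses a hypothetical 9-byte record layout (`RECORD`) for the complementary information. The names `split` and `recompose` are illustrative only. Choosing fields so that each decoy stays of the same nature (a valid code, entry or pointer) is format-specific and is not handled here.

```python
import random
import struct

# Hypothetical layout of one complementary-information record:
# bit offset (4 bytes), bit width (1 byte), original value (4 bytes), big-endian.
RECORD = struct.Struct(">IBI")


def _set_field(value: int, total: int, offset: int, width: int, field: int) -> int:
    """Overwrite `width` bits at MSB-first bit `offset` of a `total`-bit integer."""
    shift = total - offset - width
    mask = ((1 << width) - 1) << shift
    return (value & ~mask) | (field << shift)


def split(original: bytes, fields, rng: random.Random):
    """Preparatory stage: replace each (offset, width) field by a random decoy.

    Returns the modified text (same size as `original`) and the packed
    complementary information needed to undo the substitutions.
    """
    total = len(original) * 8
    value = int.from_bytes(original, "big")
    records = []
    for offset, width in fields:
        field = (value >> (total - offset - width)) & ((1 << width) - 1)
        decoy = field
        while decoy == field:  # a decoy must differ from the original value
            decoy = rng.getrandbits(width)
        value = _set_field(value, total, offset, width, decoy)
        records.append(RECORD.pack(offset, width, field))
    return value.to_bytes(len(original), "big"), b"".join(records)


def recompose(modified: bytes, complementary: bytes) -> bytes:
    """Synthesis on the addressee's equipment: put every original field back."""
    total = len(modified) * 8
    value = int.from_bytes(modified, "big")
    for offset, width, field in RECORD.iter_unpack(complementary):
        value = _set_field(value, total, offset, width, field)
    return value.to_bytes(len(modified), "big")


rng = random.Random(7)
original = bytes(range(64))
modified, complementary = split(original, [(10, 9), (100, 12), (300, 8)], rng)
assert len(modified) == len(original) and modified != original
assert recompose(modified, complementary) == original
```

In this sketch the complementary information grows by nine bytes per substituted field, whatever the size of the text, which matches the idea of a small separately transmitted part.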

[0022] In an embodiment this binary data represents an entry into a coding table and the 
decoy represents a different entry into this coding table. 

[0023] In another embodiment the coding table is constructed in a dynamic manner during 
the decoding. 

[0024] The coding table is advantageously predefined by a given standard or a given norm. 
[0025] This binary data advantageously represents a prior position in the digital text 
generated in the course of the decoding and the decoy represents a different prior position in this 
digital text generated in the course of the decoding. 

[0026] In an embodiment the modified compressed digital text 5 is in conformity with the 
standard of the original compressed digital text 1. 

[0027] In another embodiment the modified compressed digital text 5 is in conformity with 
the format of the original compressed digital text 1. 

[0028] In an embodiment this binary data and this decoy have the same size. 
[0029] In another embodiment this binary data and this decoy have different sizes. 
[0030] The series of binary data is preferably coded differentially.

[0031] In a variant the modified compressed digital text has the same size as the original 
compressed digital text. 

[0032] In another variant the modified compressed digital text has a size different from that
of the original compressed digital text.

[0033] The compressed digital text reconstituted from the modified compressed digital text is 
preferably strictly identical to the original compressed digital text. 

[0034] The process is advantageously applied to compressed digital texts stemming from the 
LZW compression format. 

[0035] The process is advantageously applied to compressed digital texts stemming from the 
ZLIB/DEFLATE compression format. 

[0036] The process is advantageously applied to compressed digital texts stemming from the 
Adobe PDF format. 

[0037] The process is advantageously applied to compressed digital images stemming from 
the TIFF format. 

[0038] The process is advantageously applied to compressed digital images stemming from 
the GIF format. 

[0039] The present invention also relates to a system for implementing the process, 
comprising at least one server containing original compressed digital texts and comprising an 
apparatus for analyzing the compressed digital text, an apparatus for separating the original 
compressed digital text into a modified compressed digital text and into complementary 
information as a function of this analysis, at least one telecommunication network for the 
transmission and at least one apparatus in the equipment of the addressee for the recomposition 
of the original compressed digital text as a function of this modified compressed digital text and
of this complementary information. 

[0040] The present invention will be better understood with the aid of the following 
exemplary embodiments concerning statistical compression algorithms of digital texts well 
known to an expert in the art. 

[0041] The LZW (Lempel-Ziv-Welch) compression algorithm is a statistical compression
algorithm with variable-length codes that has been adopted in particular as a compression
method in the TIFF (Tag Image File Format), GIF (Graphics Interchange Format) and Adobe
PDF (Portable Document Format) standards. The LZW algorithm compresses binary data (byte
streams) as well as visual data (pixel streams) and the data of a digital text.

[0042] The data resulting from the LZW compression algorithm consist of a sequence of codes
with a length of between 9 and 12 bits. Each code represents either a single character (that is, a
byte with a value between 0 and 255), a table re-initialization marker (value 256), an "end of
data" marker (value 257), or an entry in the table (value ≥ 258), which entry is associated with a
sequence of bytes previously found in the digital text to be compressed. Initially, in the encoding
as well as in the decoding, the codes have a length of 9 bits (values between 0 and 257) and the
table is initialized with its first 258 entries (the 256 values of a byte, plus the re-initialization
marker 256 and the end-of-data marker 257). As the encoding (or decoding) progresses, new
codes are added to the table, each associated with a byte sequence of variable length that may
recur in the digital text to be compressed (or decompressed). Each time a byte sequence that has
already appeared reappears in the digital text, the code of the table entry storing this sequence is
emitted into the compressed digital text. Likewise, during decompression each code is
systematically replaced by the byte sequence read at the corresponding table entry, and a new
entry is added to the table to store the sequence formed from the previously decoded sequence
followed by the first byte of the current one. Thus, the table is constructed dynamically in the
same manner during encoding and decoding. When the binary length of the codes is no longer
sufficient to represent an entry in the table, it is increased by 1: as soon as the number of entries
in the table reaches 510 the codes are coded on 10 bits, and likewise when it reaches 1022
(11 bits) and 2046 (12 bits). However, the codes never exceed a length of 12 bits (4096 entries,
i.e., a maximum code value of 4095). In a compressed digital text the code 256 can appear
several times: the table is then re-initialized and the binary length of the codes is reset to 9.
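
The dynamic table construction described above can be illustrated with a small decoder. This is a sketch under stated assumptions: codes are packed most-significant bit first, as in TIFF and PDF, and the point at which the code width grows is controlled by a parameter `early_change`, modelled on the PDF `EarlyChange` option (default 1). The exact switch thresholds should be checked against the target format rather than taken from this sketch; `lzw_decode` is an illustrative name.

```python
def lzw_decode(data: bytes, early_change: int = 1) -> bytes:
    """Decode an LZW stream packed MSB-first (TIFF/PDF style).

    Code 256 re-initializes the table and 257 ends the data; codes start
    at 9 bits and grow to at most 12 bits.
    """
    CLEAR, EOD, MAX_ENTRIES = 256, 257, 4096
    table = [bytes([b]) for b in range(256)] + [b"", b""]  # 256/257 are markers
    width, prev = 9, None
    bitbuf = nbits = pos = 0
    out = bytearray()
    while True:
        # Pull whole bytes into the bit buffer until one code is available.
        while nbits < width:
            if pos == len(data):
                return bytes(out)  # stream ended without an end-of-data code
            bitbuf = (bitbuf << 8) | data[pos]
            pos += 1
            nbits += 8
        nbits -= width
        code = bitbuf >> nbits          # the top `width` bits of the buffer
        bitbuf &= (1 << nbits) - 1      # keep only the bits not yet consumed
        if code == CLEAR:
            table, width, prev = table[:258], 9, None
            continue
        if code == EOD:
            return bytes(out)
        if code < len(table):
            entry = table[code]
        elif code == len(table) and prev is not None:
            entry = prev + prev[:1]  # code for the entry being created right now
        else:
            raise ValueError(f"invalid LZW code {code}")
        out += entry
        if prev is not None and len(table) < MAX_ENTRIES:
            table.append(prev + entry[:1])  # previous sequence + first byte of this one
        prev = entry
        # Widen the codes once the next free index (shifted by early_change) needs it.
        next_index = len(table) + early_change
        for limit, bits in ((2048, 12), (1024, 11), (512, 10)):
            if next_index >= limit:
                width = bits
                break


packed = bytes([0x80, 0x0B, 0x60, 0x50, 0x22, 0x0C, 0x0C, 0x85, 0x01])
print(lzw_decode(packed))  # b'-----A---B'
```

The nine bytes in the example pack the 9-bit codes 256, 45, 258, 258, 65, 259, 66, 257. The first 258 refers to the entry that is being created at that very step, which is why the decoder builds it from the previous sequence.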
[0043] During the protection operation of a compressed digital text stemming from the LZW 
algorithm, an algorithm reads the byte stream and dynamically constructs the table in the same 
manner as an LZW decompression algorithm. 

[0044] The protection operation of an LZW compressed digital text consists in extracting from
the sequence, in a random and/or deterministic manner, one or several original codes (their
number being determined randomly or by calculation) and in replacing them with one or several
valid "decoy" codes, which point to entries in the table. A "decoy" code is called valid when the
table entry it points to exists and corresponds to a sequence of bytes of the same length as the
sequence pointed to by the original code.
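
A minimal sketch of this decoy substitution, working on already unpacked code lists (the bit packing is left out). The helpers `protect` and `restore` are hypothetical names. They replay the table as a decompression algorithm would and only accept decoys that point to an existing entry whose byte sequence has the same length as the original one; since every new entry is the previous sequence extended by one byte, such decoys leave the lengths of all later entries unchanged.

```python
import random

CLEAR, EOD = 256, 257  # re-initialization marker and end-of-data marker


def _replay(codes):
    """Rebuild the LZW table exactly as a decoder would.

    Yields (position, code, table, entry): `table` is the state just before
    `code` is applied and `entry` is the byte sequence the code decodes to.
    """
    table = [bytes([b]) for b in range(256)] + [b"", b""]  # 256/257 are markers
    prev = None
    for pos, code in enumerate(codes):
        if code == CLEAR:
            table, prev = table[:258], None
            continue
        if code == EOD:
            return
        if code < len(table):
            entry = table[code]
        elif code == len(table) and prev is not None:
            entry = prev + prev[:1]  # code for the entry being created right now
        else:
            raise ValueError(f"invalid code {code} at position {pos}")
        yield pos, code, table, entry
        if prev is not None and len(table) < 4096:
            table.append(prev + entry[:1])
        prev = entry


def decode_codes(codes):
    return b"".join(entry for _, _, _, entry in _replay(codes))


def protect(codes, n_decoys, rng):
    """Replace up to n_decoys codes by valid decoys; return the modified codes
    and the complementary information (position, original code)."""
    candidates = []
    for pos, code, table, entry in _replay(codes):
        decoys = [c for c in range(len(table))
                  if c not in (CLEAR, EOD) and c != code
                  and len(table[c]) == len(entry)]
        if decoys:
            candidates.append((pos, decoys))
    modified = list(codes)
    complementary = []
    for pos, decoys in rng.sample(candidates, min(n_decoys, len(candidates))):
        complementary.append((pos, codes[pos]))
        modified[pos] = rng.choice(decoys)
    return modified, sorted(complementary)


def restore(modified, complementary):
    restored = list(modified)
    for pos, code in complementary:
        restored[pos] = code
    return restored


codes = [256, 45, 258, 258, 65, 259, 66, 257]  # decodes to b"-----A---B"
modified, complementary = protect(codes, 2, random.Random(3))
print(decode_codes(modified))                  # same length, scrambled content
assert restore(modified, complementary) == codes
```

Because each decoy is smaller than the current table size, it should also fit in the code width in use at that point, so repacking the modified codes keeps the compressed size.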

[0045] Because every entry added to the table during decoding is the previously decoded
sequence extended by one byte, a valid decoy leaves the lengths of all subsequent table entries
unchanged. The digital text decompressed from the modified compressed digital text therefore
has the same size as the original digital text. The text displayed from the modified compressed
digital text consists of a random succession of alphabetical characters and punctuation signs that
is not intelligible to a human being.

[0046] The Adobe PDF (Portable Document Format) format uses the LZW statistical
compression algorithm to compress objects of the digital text type in a document encoded in the
PDF format. An object of the digital text type represents a paragraph, one or several pages of
text, or the caption of a figure. Each object of the digital text type is coded independently. Thus,
the present invention allows certain digital texts to be protected so that the displayed text is
illegible and/or incomprehensible, while other text objects in the same PDF document remain
readable and comprehensible.

[0047] The present invention permits the protection of objects of the figure type and of the
digital image type incorporated in a text document and resulting from a statistical compression
algorithm, by making them incoherent from the viewpoint of human visual perception while
leaving text objects in the same PDF document readable and comprehensible.

[0048] The present invention advantageously permits the protection of digital images in the
TIFF and GIF formats by making them incoherent from the viewpoint of human visual
perception.

[0049] The zlib/deflate compression algorithm is a combination of two statistical
compression algorithms: Huffman and LZ77 (Lempel-Ziv 77). It is used in particular for
compressing objects of the digital text and/or digital image and/or figure type in the Adobe PDF
format.

[0050] The Huffman algorithm consists in replacing a succession of symbols from a given
alphabet in an original stream by a series of variable-length codes, each code substituting for one
symbol in the compressed stream. The algorithm begins by analyzing the number and the
frequency of the symbols appearing in the original stream in order to construct a coding tree,
from which it associates each encountered symbol with a code that is shorter the more frequently
the symbol appears in the original stream. The compression then consists in replacing each
symbol with its associated code. The decompression algorithm, however, needs the coding tree
in order to decompress the compressed stream. In the zlib/deflate algorithm, a modified version
of Huffman is used: the coding tree is constructed according to supplementary rules that make it
unique (canonical), so the decompression algorithm no longer needs the coding tree itself but
only the lengths of the codes used, from which it reconstructs the tree.
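
The fact that the code lengths alone determine the codes can be shown with the canonical code assignment of RFC 1951 (section 3.2.2), the rule deflate uses. The sketch below is a direct transcription of that procedure; `canonical_codes` is an illustrative name, and codes are returned as bit strings for readability.

```python
def canonical_codes(lengths):
    """Assign canonical Huffman codes from code lengths (RFC 1951, 3.2.2).

    lengths[s] is the code length of symbol s (0 means the symbol is unused).
    Returns one bit string per symbol, or None for unused symbols.
    """
    max_len = max(lengths)
    # Step 1: count how many codes have each length.
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1
    # Step 2: smallest code value for each length.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Step 3: give consecutive values to symbols of equal length, in symbol order.
    result = []
    for n in lengths:
        if n:
            result.append(format(next_code[n], f"0{n}b"))
            next_code[n] += 1
        else:
            result.append(None)
    return result


print(canonical_codes([3, 3, 3, 3, 3, 2, 4, 4]))
# ['010', '011', '100', '101', '110', '00', '1110', '1111']
```

These are the codes of the eight-symbol example in RFC 1951: a decoder that receives only the lengths (3, 3, 3, 3, 3, 2, 4, 4) rebuilds exactly the same tree as the encoder.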

[0051] The LZ77 algorithm identifies recurrent data sequences in a stream within a sliding
window of fixed size. When a sequence that has already appeared is detected again, it is replaced
in the compressed stream by two numbers: a distance d and a length l. The distance indicates
where in the window this same sequence begins, counted back from the current position, and the
length indicates how much data the identified sequence comprises. During decompression, each
time the algorithm encounters a couple (d, l), it copies into the output stream the l data items
starting d positions before the current position.
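
A small sketch of this decoding step, assuming a hypothetical token representation in which an integer is a literal byte and a `(d, l)` tuple is a back-reference. Copying byte by byte matters because l may exceed d, in which case the copy reads bytes it has just written. The second call replaces the distance 6 by another valid distance, 3, in the spirit of the prior-position decoys mentioned earlier: the output keeps its length but not its content.

```python
def lz77_decode(tokens) -> bytes:
    """Rebuild data from LZ77 tokens.

    Assumed token format (hypothetical): an int is a literal byte, a
    (distance, length) tuple copies `length` bytes starting `distance`
    bytes before the current position.
    """
    out = bytearray()
    for token in tokens:
        if isinstance(token, int):
            out.append(token)
            continue
        distance, length = token
        if not 1 <= distance <= len(out):
            raise ValueError(f"distance {distance} points before the start of the data")
        for _ in range(length):
            out.append(out[-distance])  # byte by byte, so a copy may overlap itself
    return bytes(out)


hello = [104, 101, 108, 108, 111, 32]   # "hello "
print(lz77_decode(hello + [(6, 5)]))    # b'hello hello'
print(lz77_decode(hello + [(3, 5)]))    # b'hello lo lo'  (decoy distance)
print(lz77_decode([97, 98, (2, 5)]))    # b'abababa'      (overlapping copy)
```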

[0052] The zlib/deflate compression algorithm uses three compression modes: a "no
compression" mode for data that has already been compressed, a classic LZ77 + Huffman mode
using the fixed coding trees defined in the specifications of the algorithm, and a modified
LZ77 + Huffman mode whose coding trees are specific to each block and are transmitted in it as
code lengths. The data is cut into blocks, each block being coded independently according to one
of these three modes.

[0053] In modes 2 and 3 the data is first coded according to LZ77, generating a sequence of
symbols of the "character" type (i.e., a byte with a value between 0 and 255) or of the
distance-length couple (d, l) type. This symbol sequence is then compressed with a classic
Huffman algorithm (mode 2) or a modified Huffman algorithm (mode 3).

[0054] In conformity with the invention, the operation of protecting a compressed digital text
according to the zlib/deflate algorithm consists in modifying one or several blocks coded
according to modes 2 or 3. The modifications consist in extracting from the compressed digital
text a Huffman code coding a symbol of the "character" or distance d type and replacing it with a
valid Huffman code. A Huffman code is called valid if it has the same length as the code that it
replaces and if it actually corresponds to a coded symbol of the same type, that is, character or
distance.
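
For a block coded with the fixed Huffman tables (mode 2), RFC 1951 (section 3.2.6) fixes the code lengths of the literal/length alphabet: 8 bits for symbols 0 to 143, 9 for 144 to 255, 7 for 256 to 279 and 8 for 280 to 287. A valid decoy for a literal, in the sense above, is then another literal (0 to 255, hence the same type) with the same code length, which keeps the bit stream the same size. This sketch only covers literals; `fixed_literal_code` and `literal_decoy` are illustrative names, and a distance decoy would presumably also need the same number of extra bits and a distance that stays within the data already decoded.

```python
import random


def fixed_literal_code(symbol):
    """(code, length) of a literal/length symbol in DEFLATE's fixed Huffman table."""
    if 0 <= symbol <= 143:
        return 0b00110000 + symbol, 8
    if 144 <= symbol <= 255:
        return 0b110010000 + (symbol - 144), 9
    if 256 <= symbol <= 279:
        return symbol - 256, 7
    if 280 <= symbol <= 287:
        return 0b11000000 + (symbol - 280), 8
    raise ValueError(f"not a literal/length symbol: {symbol}")


def literal_decoy(symbol, rng):
    """Pick another literal (0..255) whose fixed Huffman code has the same length."""
    if not 0 <= symbol <= 255:
        raise ValueError("only literal symbols (0-255) are substituted in this sketch")
    _, length = fixed_literal_code(symbol)
    candidates = [s for s in range(256)
                  if s != symbol and fixed_literal_code(s)[1] == length]
    return rng.choice(candidates)


rng = random.Random(5)
print(fixed_literal_code(ord("e")))   # (149, 8): 'e' is 101, code 0b10010101
decoy = literal_decoy(ord("e"), rng)
print(chr(decoy), fixed_literal_code(decoy))
```

Restricting candidates to 0..255 excludes symbols 280 to 287, which also have 8-bit codes but are length codes, so the decoy keeps the "character" type.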

[0055] A modified zlib/deflate compressed digital text has the same size as the original
zlib/deflate compressed digital text. Likewise, the digital text decompressed from the modified
compressed digital text has the same size as the original digital text.

[0056] Displaying the modified compressed digital text produces a text that is illegible and/or
incomprehensible for a human being, because it shows a succession of characters and
punctuation signs with no logical order.

[0057] The specifications of the zlib format define a 4-byte ADLER32 field located at the
end of the compressed digital text: this field stores a checksum of the original (uncompressed)
digital text and is used during decompression to verify the integrity of the digital text. In the
case of a zlib/deflate compressed digital text modified according to the present invention, the
checksum of the decompressed digital text will not be identical to that of the original digital text.

[0058] The original checksum is advantageously updated during the application of the
protection.
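
This update can be sketched with Python's standard `zlib` module, which provides `zlib.adler32`. The zlib trailer stores the Adler-32 checksum of the uncompressed data as four big-endian bytes. In this sketch `refresh_checksum` (a hypothetical name) decompresses the modified stream without its trailer and appends the checksum of the text it now yields. To obtain a modified stream simply, the demonstration changes one byte inside a stored ("no compression") block instead of substituting a Huffman code; this is an assumption made only for the example.

```python
import zlib


def refresh_checksum(stream: bytes) -> bytes:
    """Replace the 4-byte Adler-32 trailer of a zlib stream with the checksum
    of the data the (possibly modified) stream actually decompresses to."""
    body = stream[:-4]
    # Without its trailer the stream has no checksum to verify, so this does not fail.
    plain = zlib.decompressobj().decompress(body)
    return body + zlib.adler32(plain).to_bytes(4, "big")


text = b"The quick brown fox"
stream = zlib.compress(text, 0)                     # level 0: bytes kept verbatim
pos = stream.index(text)
tampered = stream[:pos] + b"X" + stream[pos + 1:]   # stand-in for a decoy substitution

try:
    zlib.decompress(tampered)                       # the old trailer no longer matches
    print("unexpected: checksum still matched")
except zlib.error as exc:
    print("rejected:", exc)

fixed = refresh_checksum(tampered)
print(zlib.decompress(fixed))                       # b'Xhe quick brown fox'
```

The refreshed stream keeps the size of the original one; only the four trailer bytes change.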

[0059] The invention will be better understood with the aid of the description, given below
purely by way of explanation, of an embodiment of the invention with reference to attached
figure 1, which illustrates a particular embodiment of the system permitting the protection and
secure distribution of compressed digital texts in accordance with the invention.

[0060] Compressed digital text 1 to be secured is passed via link 2 to analysis and protection
module 3, which generates a modified compressed digital text 5 having a format identical to that
of original compressed digital text 1 except that certain binary data have been replaced by values
different from the original ones; modified compressed digital text 5 is stored in server 6.
Complementary information 4, of any format, is also placed in server 6 and contains information
on the data of the compressed digital text that were modified, replaced, substituted or shifted,
and on their values or locations in the original compressed digital text.

[0061] Protected compressed digital text 5, in a format identical to that of the original
compressed digital text, is then advantageously transmitted via a high-throughput network 9 of
the microwave, cable or satellite type, or another network, to the terminal of user 8, more
precisely into a memory 10.

[0062] When user 8 requests to display a text present in memory 10, there are two
possibilities. Either user 8 does not have all the rights necessary to exploit the compressed digital
text, in which case modified compressed digital text 5, generated by protection module 3 and
present in memory 10, is passed via reading buffer memory 11 to synthesis system 13, which
does not modify it and transmits it unchanged to a display device 14 capable of decoding it; its
contents, degraded by protection module 3 and incomprehensible from a semantic viewpoint, are
displayed on viewing screen 15. Modified compressed digital text 5 generated by protection
module 3 is advantageously passed directly via network 9 to reading buffer memory 11, then to
synthesis module 13.

[0063] Or server 6 decides that user 8 has the rights to correctly display the compressed
digital text. In this case synthesis module 13 sends server 6 a display request for the
complementary information 4 necessary for the recomposition of original compressed digital
text 1. Server 6 then sends complementary information 4 via telecommunication network 7 of
the analog or digital telephone line type, of the DSL (Digital Subscriber Line) or BLR (Boucle
Locale Radio, i.e., local radio loop) type, via DAB (Digital Audio Broadcasting) networks or via
digital mobile telecommunication networks (GSM, GPRS, UMTS) 7. This complementary
information permits the reconstitution of the compressed digital text, and user 8 can store it in
buffer memory 12. Synthesis module 13 then reconstitutes the original compressed digital text
from the modified compressed digital text that it reads in its reading buffer memory 11: the
modified fields, whose positions it knows, are restored to their original values using the content
of the complementary information read in recomposition buffer memory 12. Complementary
information 4 sent to the recomposition module is specific to each user and is a function of his
rights, e.g., single or multiple use, the right to make one or several private copies, late or early
payment.

[0064] Modified compressed digital text 5 is passed directly via network 9 to reading buffer
memory 11, then to synthesis module 13.

[0065] Modified compressed digital text 5 is advantageously recorded on a physical support
such as a disk of the CD-ROM or DVD type, a hard disk or a flash memory card. Modified
compressed digital text 5 is then read from physical support 9bis by disk reader 10bis of box 8 in
order to be transmitted to reading buffer memory 11, then to recomposition module 13.

[0066] Complementary information 4 is advantageously recorded on a physical support 7bis
with a credit card format, constituted by a smart card or a flash memory card. This card 7bis is
then read by module 12 of apparatus 8, which comprises a card reader 7ter.

[0067] Card 7bis advantageously contains the applications and the algorithms that will be 
executed by recomposition module 13. 

[0068] Apparatus 8 is advantageously an autonomous, portable and mobile apparatus. 